The article traces the evolution of model inference techniques from 2017 to a projected 2025, highlighting the progression from serving models behind general-purpose web frameworks like Flask and FastAPI to purpose-built solutions like Triton Inference Server and vLLM. It details the increasing demands on inference infrastructure driven by larger and more complex models, and the resulting need for optimization in throughput, latency, and cost.
DeepMind, working in collaboration with historians, introduces Ithaca, a deep neural network that can restore damaged ancient Greek inscriptions, identify their original location, and help establish their creation date, advancing the study of ancient history.
This blog post details the training of 'Chess Llama', a small Llama model designed to play chess. It covers the inspiration behind the project (Chess GPT), the dataset used (the Lichess Elite database), the training process using Hugging Face Transformers, and the model's performance (an Elo rating of 1350-1400). It also includes links to try the model and view the source code.
A detailed comparison of the architectures of recent large language models (LLMs) including DeepSeek-V3, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen3, SmolLM3, and Kimi K2, focusing on key design choices and their impact on performance and efficiency.
This article discusses the history of AI, the split between neural networks and symbolic AI, and the recent vindication of neurosymbolic AI through the advancements of models like o3 and Grok 4. It argues that combining the strengths of both approaches is crucial for achieving true AI and highlights the resistance to neurosymbolic AI from some leaders in the deep learning field.
This tutorial introduces the essential topics of the PyTorch deep learning library in about one hour. It covers tensors, training neural networks, and training models on multiple GPUs.
This book covers foundational topics within computer vision from an image processing and machine learning perspective. It aims to build the reader's intuition through visualizations and is intended for undergraduate and graduate students, as well as experienced practitioners.
This survey paper outlines key developments in the field of Large Language Models (LLMs), including improved reasoning skills, adaptability to various tasks, computational efficiency, and the ability to make ethical decisions. Among the techniques that have been most effective in bridging the gap between human and machine communication are Chain-of-Thought prompting, Instruction Tuning, and Reinforcement Learning from Human Feedback. Advances in multimodal learning and few-shot or zero-shot techniques have further empowered LLMs to handle complex tasks with minimal input, while scaling and optimization strategies help conserve computing power. The survey also offers a broader perspective on recent advancements in LLMs, going beyond isolated aspects such as model architecture or ethical concerns: it categorizes emerging methods that enhance LLM reasoning, efficiency, and ethical alignment, and identifies underexplored areas such as interpretability, cross-modal integration, and sustainability. Despite recent progress, challenges such as high computational costs, bias, and ethical risks persist; addressing them requires bias mitigation, transparent decision-making, and clear ethical guidelines. Future research will focus on enhancing models' ability to handle multiple input modalities, making them more intelligent, safe, and reliable.
This paper demonstrates that the inference operations of several open-weight large language models (LLMs) can be mapped to an exactly equivalent linear system for an input sequence. It explores the use of the 'detached Jacobian' to interpret semantic concepts within LLMs and potentially steer next-token prediction.
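The core observation can be illustrated on a toy case. For a bias-free ReLU network, the forward pass is piecewise linear, so "detaching" the activation gating at a given input yields a Jacobian that reproduces the output exactly. This is a minimal numpy sketch of that principle on a two-layer MLP, not the paper's actual construction for transformer LLMs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bias-free ReLU MLP: f(x) = W2 @ relu(W1 @ x).
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))

def f(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.standard_normal(4)

# "Detach" the ReLU gating at this input: treat the active-unit mask
# as a constant, which makes the network exactly linear at x.
mask = (W1 @ x > 0).astype(float)
J = W2 @ np.diag(mask) @ W1  # the (detached) Jacobian at x

# Because the bias-free ReLU network is piecewise linear, the
# Jacobian reproduces the output exactly: f(x) == J @ x.
assert np.allclose(J @ x, f(x))
```

The same equivalence is what makes the linear system interpretable: the rows of `J` can be inspected directly as the linear map the network applies to this particular input.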
This article demonstrates how to use the attention mechanism in a time series classification framework, specifically for classifying normal sine waves versus 'modified' (flattened) sine waves. It details the data generation, model implementation (using a bidirectional LSTM with attention), and results, achieving high accuracy.
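The attention-pooling step at the heart of such a classifier can be sketched in a few lines of numpy. Here `H` stands in for the bidirectional LSTM's per-timestep hidden states, and `W` and `v` are hypothetical learned parameters (the article's actual layer shapes may differ); the additive attention scores each time step, normalizes the scores to weights, and collapses the sequence into one context vector for the classifier head:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-in for bidirectional LSTM outputs: one hidden vector per
# time step, shape (seq_len, hidden_dim).
H = rng.standard_normal((50, 16))

W = rng.standard_normal((16, 16))  # hypothetical learned projection
v = rng.standard_normal(16)        # hypothetical learned score vector

scores = np.tanh(H @ W) @ v        # one score per time step, (seq_len,)
weights = softmax(scores)          # attention weights, sum to 1
context = weights @ H              # weighted sum over time, (hidden_dim,)

assert np.isclose(weights.sum(), 1.0)
assert context.shape == (16,)
```

Inspecting `weights` after training is what makes this setup attractive for the sine-wave task: the weights indicate which time steps (e.g. the flattened region) the model relied on for its classification.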